**********************************************************************
*	REPLICATION CODE
*
*	"Robust inequality of opportunity comparisons: 
*	Theory and application to early-childhood policy evaluation" 
*
*	The Review of Economics and Statistics
*
*	Francesco Andreoli (LISER)
*	Tarjei Havnes (University of Oslo) 
*	Arnaud Lefranc (University of Cergy-Pontoise)
*
*	March 2018
**********************************************************************

This file describes the replication code used in the paper. The code is organized in two sections:

	ESTIMATION: 	Stata (optimized for version 14.0) code producing RIF regression estimates of the quantile treatment effects (QTE)
			of the Norwegian Kindergarten Act reform.

	INFERENCE:	Stata (optimized for version 14.0) code and output implementing the dominance and EZOP tests (see also "Implementation algorithm" in the Online Appendix to the paper).
			The code uses the QTE and order statistics from ESTIMATION to produce the figures and tables presented in the main text.
 

ESTIMATION :
************
	a) -ezopboot.do-	Produces RIF DiD estimators for the main effects. Estimates are conditional on family background (parental income deciles). Calls basic functions from the -ezopprogs.do- file. Produces the bootstrapped estimates used by the commands in the INFERENCE folder.
				Estimates based on register data available at Statistics Norway. 
				Main variables described below
					- far_aarutd16 - Father's years of completed education
					- mor_aarutd16 - Mother's years of completed education
					- faminc_bhmean - Average family income when child is in child care age
					- treat - Dummy for treatment municipality
					- post - Dummy for post-reform cohort
					- wy0609 - Child income in 2006-2009
					- bornYYYY - Dummy for being born in year YYYY
					- ccov36 - Child care coverage rate for 3-6 year olds in municipality when child is in child care age
								 
				
				* Raw data are NOT provided due to confidentiality reasons (see "Data availability statement"). 
				  Inquiries about data access should be addressed to: Labour Market Section, Statistics Norway, PB 8131 Dep, 0033 Oslo, Norway.


	b) -ezopprogs.do-	Includes the Mata routines implementing the RIF DiD estimators.


INFERENCE :
***********
	
	a) DO FILES: 
	-------------

	a.1) -ezop_test_shell.do-	First file to run. It sequentially runs the other do-files that reproduce the figures and tables, starting from the baseline model estimates.
	
	a.2) -settings.do-		Produces estimates of pairwise comparisons of QTE across percentiles of the child earnings distribution and deciles of the parental income distribution.
					- Additional commands: calls Mata routines in program files -tests_bootstrap99q_QTEcomparisons.do- and -tests_ezop_bootstrap99_QTE_DISTRIBUTION_02.do-.
					- Output:	 -ezop_c.png- (Figure 3 in the paper)

	a.3) -graphs_joint_test.do-	Produces estimates of p-values for joint significance tests of dominance in cdfs (ranking circumstance groups) and in gap curves based on pairwise comparisons of QTE estimates (EZOP test),
					for centiles of the children's earnings distribution, by decile of parental income.
					- Additional commands: recalls mata routines in program files -tests_bootstrap99q_QTEcomparisons.do- and -tests_ezop_bootstrap99_QTE_DISTRIBUTION_02.do-
 					- Output: 	-joint_test_F0nol.png-, -joint_test_F1nol.png-, -joint_test_QTEnol.png- (respectively panels A), B) and C) of Figure 2 in the text).
	
	a.4) -table_graph_3groups.do-	Produces estimates of CIs for cdfs and gap curves of the children's earnings distribution conditional on selected deciles (D1, D5, D9) of parental income.
					The CIs can be used to perform disjoint tests of dominance. Also reports a table of results for joint dominance tests (for the selected parental income decile groups).
					- Additional commands: calls Mata routines in program file -tests_ezop_bootstrap99_QTE_DISTRIBUTION_3groups.do-. 
					- Output: 	-gQ0.png-, -gQ.png-, -gQTE.png-, -gGAP12CI.png-, -gGAP13CI.png-, -gGAP23CI.png- (respectively panels A), B), C), D), E) and F) of Figure 1 in the text), 
					  		-tab_ezop_5(5)95_QTE.tex- (table of joint test statistics for selected deciles of parental income, gives Table 1 in the text). 	
	
	a.5) -gini_opp.do-		Produces estimates of Gini IO indices. The program produces bootstrapped SEs for the Gini IO indices and for differences in Gini IO across policy regimes.
					Results are reported as Stata output (see paper).
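
	A minimal sketch of how the INFERENCE do-files might be launched from Stata. The folder location stored in the local macro inference_dir is an assumption (a placeholder, not a path shipped with the package); the only file name taken from this README is -ezop_test_shell.do-.

```stata
* Hypothetical location of the folder holding the INFERENCE do-files;
* "." (current directory) is a placeholder to adjust to your own setup.
local inference_dir "."
cd "`inference_dir'"

* Run the shell file only if it is present, so the sketch fails gracefully.
capture confirm file "ezop_test_shell.do"
if _rc == 0 {
    do "ezop_test_shell.do"
}
else {
    display as error "ezop_test_shell.do not found in `c(pwd)'"
}
```

	Since -ezop_test_shell.do- runs the other do-files in sequence, launching it from the folder that contains them should be sufficient, provided the bootstrapped estimates in the DATA folder are reachable from the paths set inside the do-files.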

	
	b) PROGRAMS:
	-------------
	
	b.1) -tests_bootstrap99q_QTEcomparisons.do-	Stata routines implementing joint pairwise tests of dominance in cdf and in gap curves.
							These files are embedded within the main code. 

	b.2) -tests_ezop_bootstrap99_QTE_DISTRIBUTION_02.do- Defines the "test_ezop" routine. The output of the routine is a set of p-value estimates for the various null hypotheses underlying the EZOP test.

	b.3) -tests_ezop_bootstrap99_QTE_DISTRIBUTION_3groups.do- Repeats the joint stochastic dominance tests for the case of three deciles.

	b.4) -ezop_06.do-		Include the relevant Mata functions for performing joint stochastic dominance tests. See b.2) and b.3).
	     -ezop_07.do-


	c) DATA	Folder gathering databases of RIF and quantile estimates for the actual children's earnings cdfs. 
	-------	Estimates are always reported for each of the 300 bootstrapped samples.
		Data codebook is as follows:

	c.1) -ezop_estimates.dta- 	contains the main estimates for 19 quantiles (5, 10, ..., 95) and 99 percentiles of the distribution of mean market income from wages and self-employment over 2006-2009. Each estimation is run separately in 4 and 8 distinct subsamples:
					- ez_ie distinguishes between family income and family education above and below the median. Family income is calculated as the mean income of both parents in the years the child was 3 to 6 years old, measured in 2006-NOK. Family education is calculated as the maximum of the mother's and father's education. 
					- -incmed- and -educmed- equal 1 (2) for values below (above) the median.
					- ez_ieg distinguishes each of the former 4 classes by gender of the child, for a total of 8 classes. 
					- -gender- equals 1 for boys and 0 for girls.
					- the -treat- and -post- variables are defined as in Havnes and Mogstad (2015, cited in the paper), using the 1975 reform. Estimates in the dta-file are DID-estimates on the interaction between -post- and -treat-. Standard errors are calculated separately for each quantile. Confidence intervals tend to narrow slightly when bootstrapping.

	c.2) -sumstats_*.dta- 		contain summary statistics for a number of variables (see below).
					- Summary stats calculated are mean sd p5 p10 p25 p50 p75 p90 p95
					- Groups can be identified from the mean of -incmed-, -educmed-, -gender-, and -treat- in each file. Each file also includes a line for the overall mean, which can be identified from the non-integer value of -incmed-, -educmed-, -gender-, and/or -treat-.

	c.3) -sumstats_ie.dta- 		"i" indicates income, "e" indicates education, so these are for the four income by education groups

	c.4) -sumstats_iet.dta- 	"t" indicates treatment, so these are for eight income, by education, by treatment-groups

	c.5) -sumstats_ieg.dta-
	     -sumstats_iegt.dta- 	"g" indicates gender, so these are as above, but also by gender
					- Variables included are varname description:
						- faminc_bhmean		Family income 
						- fameduc 		Family education 
						- wy0609 		Market income, mean 2006-2009 
						- aarutd2006 		Years of education, 2006 
						- immigrant 		Immigrant 
						- birthorder 		Birth order 
						- incmed 		1 if below median family income, 2 if above 
						- educmed 		1 if below median family education, 2 if above 
						- gender 		1 if boy, 0 if girl 
						- treat 		1 if from treatment municipality, 0 if not
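
	As an illustration of the statistics listed above (mean sd p5 p10 p25 p50 p75 p90 p95), the sketch below computes them by income-by-education group with -tabstat-. It runs on simulated placeholder values, since the raw register data are not provided; the grouping variable ie_group is a hypothetical name, and this is not necessarily how the -sumstats_*.dta- files were built.

```stata
* Simulated placeholder data: values are NOT real estimates or register data.
clear
set seed 2018
set obs 1000
generate wy0609  = exp(rnormal(12, 0.5))    // placeholder child income
generate incmed  = 1 + (runiform() > 0.5)   // 1 = below median, 2 = above
generate educmed = 1 + (runiform() > 0.5)   // 1 = below median, 2 = above

* Hypothetical identifier for the four income-by-education cells.
egen ie_group = group(incmed educmed)

* Same set of statistics as reported in the sumstats files.
tabstat wy0609, by(ie_group) statistics(mean sd p5 p10 p25 p50 p75 p90 p95)
```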

	c.6) -intg_distwy0609faminc.dta-  gives the distribution of family income and child income for all post-by-treat-groups, and for several variants of groups by family income
					  The variables intgXY_pPtT[VAR] and intgXY_pPtT[VAR]_pdf give the quantiles and pdf of [VAR] for the group with post = P, with treat = T, and with family income between deciles X and Y.
					  E.g. "intg46_p1t1wy0609" gives the distribution of wy0609 for children from treatment municipalities and post-cohorts, where faminc_bhmean was between the 40th and the 60th percentile.
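
	The naming rule above can be spelled out with Stata local macros. This is only a sketch of the convention as described in this README; the macro names X, Y, P, T, var and name are illustrative.

```stata
* Build a variable name following the intgXY_pPtT[VAR] convention.
local X 4          // lower family income decile
local Y 6          // upper family income decile
local P 1          // post-reform cohort
local T 1          // treatment municipality
local var "wy0609" // child income variable

local name "intg`X'`Y'_p`P't`T'`var'"
display "`name'"        // quantiles of [VAR] for the group
display "`name'_pdf"    // matching pdf variable
```

	With these values the first line displayed is intg46_p1t1wy0609, the example given above.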

	c.7) -eop_bootfile_intg_rep.dta-  gives the raw estimates of the coefficients for regression model (1) in the paper, along with information on densities and parental income needed to compute QTE at precise points of the children's earnings distributions.
				          - The prefix -intgK_p1t1- refers to percentiles and density for group K. 
					  - The prefix -intgPK- refers to estimates for group K, with a P-th order polynomial in family income. 
					  - The variable famincP is the P-th power of faminc_bhmean36, i.e. faminc^P (regressions use a 4th order polynomial of family income). 
					  - The prefix pt_ indicates an interaction with post*treat (the DD estimate)

					  E.g. "intg32pt_faminc1" is the estimated coefficient for the interaction of post, treat and faminc^1 for the 3rd income group (p50-p90) in a specification using a 2nd-order polynomial in family income
	
				  	  - New variables
						- quantiles
						- int_p1t1famincP: father's income percentiles for the whole population. These quantiles can be kept fixed for each replication, since the father's income distribution is assumed to be known. Hence estimation consists in reproducing the same 100 estimates at every iteration. These incomes are faminc^P.
						- integPpt: the coefficient of post x treatment for a given percentile in a model where P-th order polynomials of the father's income are considered.
						- integPpt_famincG: the interaction of the coefficient of treatment x post times the coefficient of the faminc^G variable (for G=1,...,P) from a model where we estimate the P-th order polynomial of the father's income.
						- steps gives the earnings level that defines the y-variable (all quantiles of the child earnings distribution conditional on family earnings deciles).
						- p, group give the percentile and group from which steps is defined.
						- intgpdf, intgpdf_F give a kernel estimate of the child earnings distribution and the empirical cdf.
						- intgpdfX, intgpdfX_F gives the same for each decile of the family earnings distribution.
						- intg4_pt_XXX gives the estimated DD-effects in the QTE-model. 
						- faminc1-faminc4 indicate the polynomial in family income.
						- intg4_pt_XXX_ols gives the estimated DD-effects in the OLS-model (linear in child earnings) 
						- repl gives the replication number. repl == 0 indicates the main sample, repl > 0 indicates bootstrap samples (300 replications).  	
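
	The repl convention (repl == 0 for the main sample, repl > 0 for the 300 bootstrap samples) lends itself to a simple split between point estimate and bootstrap variability. The sketch below uses simulated placeholder values and a hypothetical column est standing in for any one estimate; it takes the standard deviation across replications as the bootstrap SE, which is one common choice and not necessarily the procedure implemented in the do-files. The actual file may hold several rows per replication (e.g. one per percentile), in which case the same steps would be applied within each row type.

```stata
* Simulated stand-in: "est" is a hypothetical name, values are placeholders.
clear
set seed 12345
set obs 301
generate repl = _n - 1          // 0 = main sample, 1..300 = bootstrap samples
generate est  = rnormal(0, 1)   // NOT real estimates

* Point estimate from the main sample.
summarize est if repl == 0, meanonly
scalar point = r(mean)

* Bootstrap standard error: sd of the estimate across replications.
summarize est if repl > 0
scalar boot_se = r(sd)

display "point estimate = " point "   bootstrap SE = " boot_se
```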


	d) GRAPHS
	----------

	d.1) -gQ0.pdf-			Figure 1, panel A.

	d.2) -gQ.pdf-			Figure 1, panel C.

	d.3) -gQTE.pdf-			Figure 1, panel B.
	
	d.4) -gGAP12CI.pdf-		Figure 1, panel D.

	d.5) -gGAP13CI.png-		Figure 1, panel E.

	d.6) -gGAP23CI.png-		Figure 1, panel F.

	d.7) -tab_ezop_5(5)95_QTE.tex-	Table 1, complete.

	d.8) -joint_test_F0nol.pdf-	Figure 2, panel A.

	d.9) -joint_test_F1nol.pdf-	Figure 2, panel B.

	
	d.10) -joint_test_QTEnol.pdf-	Figure 2, panel C.

	d.11) -ezop_c.pdf-		Figure 3.


	